8 research outputs found

    Potential and limitations of cross-domain sentiment classification

    In this paper we investigate the cross-domain performance of sentiment analysis systems. For this purpose we train a convolutional neural network (CNN) on data from different domains and evaluate its performance on other domains. Furthermore, we evaluate the usefulness of combining a large number of smaller annotated corpora into one large corpus. Our results show that more sophisticated approaches are required to train a system that works equally well on various domains.

    Twist Bytes : German dialect identification with data mining optimization

    We describe our approaches used in the German Dialect Identification (GDI) task at the VarDial Evaluation Campaign 2018. The goal was to identify which of four dialects spoken in the German-speaking part of Switzerland a sentence belonged to. We adopted two different meta-classifier approaches and used some data mining insights to improve the preprocessing and the meta-classifier parameters. In particular, we focused on using different feature extraction methods and how to combine them, since they influenced the system's performance very differently. Our system achieved second place out of 8 teams, with a macro-averaged F1 score of 64.6%. We also participated in the surprise dialect task with a multi-label approach.

    A methodology for creating question answering corpora using inverse data annotation

    In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. For this, we introduce an intermediate representation that is based on the logical query plan in a database, called Operation Trees (OT). This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furthermore, it allows for fine-grained alignment of the tokens to the operations. Thus, we randomly generate OTs from a context-free grammar, and annotators just have to write the appropriate question and assign the tokens. We compare our corpus OTTA (Operation Trees and Token Assignment), a large semantic parsing corpus for evaluating natural language interfaces to databases, to Spider and LC-QuaD 2.0 and show that our methodology more than triples the annotation speed while maintaining the complexity of the queries. Finally, we train a state-of-the-art semantic parsing model on our data and show that our dataset is challenging and that the token alignment can be leveraged to significantly increase the performance.

    spMMMP at GermEval 2018 Shared Task: Classification of Offensive Content in Tweets using Convolutional Neural Networks and Gated Recurrent Units

    In this paper, we propose two different systems for classifying offensive language in micro-blog messages from Twitter ("tweets"). The first system uses an ensemble of convolutional neural networks (CNN), whose outputs are then fed to a meta-classifier for the final prediction. The second system uses a combination of a CNN and a gated recurrent unit (GRU) together with a transfer-learning approach based on pretraining with a large, automatically translated dataset.

    Best practices in e-assessments with a special focus on cheating prevention

    In this digital age of computers, the Internet, social media, and the Internet of Things, e-assessments have become an accepted method to determine whether students have learned the materials presented in a course. With acceptance of this electronic means of assessing students, many questions arise about this method. What should be the format of an e-assessment? How much time should be allowed? What kinds of questions should be asked (multiple choice, short answer, etc.)? These are only a few of the many different questions. In addition, educators have always had to contend with the possibility that some students might cheat on an examination. It is widely known that students are oftentimes more technologically savvy than their professors. So how does one prevent students from cheating on an e-assessment? Understandably, given the amount of information available on e-assessments and the variety of formats to choose from, choosing to administer e-assessments over paper-based assessments can lead to confusion on the part of the professor. This paper presents helpful guidance for lecturers who want to introduce e-assessments in their classes, and it provides recommendations about the technical infrastructure to implement to prevent students from cheating. It is based on a literature review, on an international survey that gathers insights and experiences from lecturers who are using e-assessment in their classes, and on a technological evaluation of e-assessment infrastructure.